Weighted determinization and minimization for large vocabulary speech recognition

نویسندگان

  • Mehryar Mohri
  • Michael Riley
چکیده

ABSTRACT Speech recognition requires solving many space and time problems that can have a critical effect on the overall system performance. We describe the use of two general new algorithms [5] that transform recognition networks into equivalent ones that require much less time and space in large-vocabulary speech recognition. The new algorithms generalize classical automata determinization and minimization to deal properly with the probabilities of alternative hypotheses and with the relationships between units (distributions, phones, words) at different levels in the recognition system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Network Optimizations for Large VocabularySpeech

The redundancy and the size of networks in large-vocabulary speech recognition systems can have a critical eeect on their overall performance. We describe the use of two new algorithms: weighted determinization and minimization 12]. These algorithms transform recognition labeled networks into equivalent ones that require much less time and space in large-vocabulary speech recognition. They are ...

متن کامل

A tail-sharing WFST composition algorithm for large vocabulary speech recognition

This paper presents an algorithm for approximating minimization in the context of the weighted finite-state transducers approach to large vocabulary speech recognition. The algorithm is designed for the integration of the lexicon with the language model and performs composition, determinization and pushing in one step. Furthermore, it uses tail-sharing in order to approximate minimization. Our ...

متن کامل

Speech Recognition with Weighted Finite-state Transducers

This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices...

متن کامل

An optimal pre-determinization algorithm for weighted transducers

We present a general algorithm, pre-determinization, that makes an arbitrary weighted transducer over the tropical semiring or an arbitrary unambiguous weighted transducer over a cancellative commutative semiring determinizable by inserting in it transitions labeled with special symbols. After determinization, the special symbols can be removed or replaced with -transitions. The resulting trans...

متن کامل

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997